Fred LaPolla
10/10/2023
In looking at gene expression data you may want to see a heatmap to see which timepoints/genes are being expressed. A heatmap is a quick way to look at large volumes of data by highlighting cells that have higher or lower values in them.
This data provided by Dr. Dolgalev is a table of RNA-seq normalized counts. These are expression values of all genes across different samples. There are wild-type and Tet2-knockout samples (4 of each).
For more details on how the table was generated, the steps are summarized here: http://bit.ly/snsdemo
## gene pan1wt pan2wt pan3ko pan4ko pan5ko pan6ko pan7wt pan8wt
## 1 0610005C13Rik 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## 2 0610006L08Rik 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## 3 0610009B22Rik 35.856 40.314 35.508 37.582 38.570 38.426 41.635 40.510
## 4 0610009E02Rik 1.434 0.000 0.000 0.817 0.000 2.175 0.000 1.870
## 5 0610009L18Rik 2.868 2.573 1.732 3.268 0.000 0.000 0.000 1.246
## 6 0610009O20Rik 74.581 60.041 45.900 44.936 57.855 58.001 47.691 84.759
## [1] "data.frame"
We can see column 1 is the gene name, column 2, 3 and 8 and 9 are the Wild Type samples, and columns 4:7 are the knock out samples.
## gene pan1wt pan2wt pan7wt pan8wt pan3ko pan4ko pan5ko pan6ko
## 1 0610005C13Rik 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## 2 0610006L08Rik 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## 3 0610009B22Rik 35.856 40.314 41.635 40.510 35.508 37.582 38.570 38.426
## 4 0610009E02Rik 1.434 0.000 0.000 1.870 0.000 0.817 0.000 2.175
## 5 0610009L18Rik 2.868 2.573 0.000 1.246 1.732 3.268 0.000 0.000
## 6 0610009O20Rik 74.581 60.041 47.691 84.759 45.900 44.936 57.855 58.001
dolgaMat <- as.matrix(dolgalevNormRNA[,2:9], "numeric")
rownames(dolgaMat)<- dolgalevNormRNA[,1]
head(dolgaMat)## pan1wt pan2wt pan7wt pan8wt pan3ko pan4ko pan5ko pan6ko
## 0610005C13Rik 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## 0610006L08Rik 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## 0610009B22Rik 35.856 40.314 41.635 40.510 35.508 37.582 38.570 38.426
## 0610009E02Rik 1.434 0.000 0.000 1.870 0.000 0.817 0.000 2.175
## 0610009L18Rik 2.868 2.573 0.000 1.246 1.732 3.268 0.000 0.000
## 0610009O20Rik 74.581 60.041 47.691 84.759 45.900 44.936 57.855 58.001
## [1] "matrix" "array"
## [1] "numeric"
## pan1wt pan2wt pan7wt pan8wt
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00 Median : 0.00 Median : 0.00
## Mean : 96.99 Mean : 112.84 Mean : 98.06 Mean : 82.22
## 3rd Qu.: 2.87 3rd Qu.: 2.57 3rd Qu.: 3.03 3rd Qu.: 3.12
## Max. :38552.42 Max. :83293.70 Max. :37493.25 Max. :103143.74
## pan3ko pan4ko pan5ko pan6ko
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00 Median : 0.00 Median : 0.00
## Mean : 117.37 Mean : 112.88 Mean : 102.06 Mean : 97.49
## 3rd Qu.: 1.73 3rd Qu.: 2.45 3rd Qu.: 3.21 3rd Qu.: 2.17
## Max. :76915.11 Max. :68016.87 Max. :54551.21 Max. :41122.76
knockOut <- dolgaMat[,5:8]
wildType <- dolgaMat[,1:4]
exp_genesAll <- names(which(rowSums(dolgaMat)>15000))
exp_genesKO <- names(which(rowSums(knockOut)>15000))
exp_genesWT <- names(which(rowSums(wildType)>15000))
dolgaExp <- dolgaMat[exp_genesAll,]
expressedKO <- knockOut[exp_genesKO,]
expressedWT <- wildType[exp_genesWT,]Convert the following data into a matrix, with rownames for the genes being named. Re-order the columns to group WT and KO mice strains together. Make subsets of highly expressed genes (choose your own threshold)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.6.2
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
##
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
## genomic data. Bioinformatics 2016.
##
## This message can be suppressed by:
## suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
## The automatically generated colors map from the 1^st and 99^th of the
## values in the matrix. There are outliers in the matrix whose patterns
## might be hidden by this color mapping. You can manually set the color
## to `col` argument.
##
## Use `suppressMessages()` to turn off this message.
## The automatically generated colors map from the 1^st and 99^th of the
## values in the matrix. There are outliers in the matrix whose patterns
## might be hidden by this color mapping. You can manually set the color
## to `col` argument.
##
## Use `suppressMessages()` to turn off this message.
## The automatically generated colors map from the 1^st and 99^th of the
## values in the matrix. There are outliers in the matrix whose patterns
## might be hidden by this color mapping. You can manually set the color
## to `col` argument.
##
## Use `suppressMessages()` to turn off this message.
Heatmap(dolgaExp, cluster_columns = FALSE, clustering_method_rows = "complete", show_row_names = FALSE)## The automatically generated colors map from the 1^st and 99^th of the
## values in the matrix. There are outliers in the matrix whose patterns
## might be hidden by this color mapping. You can manually set the color
## to `col` argument.
##
## Use `suppressMessages()` to turn off this message.
Heatmap(expressedKO, cluster_columns = FALSE, clustering_method_rows = "complete", show_row_names = FALSE)## The automatically generated colors map from the 1^st and 99^th of the
## values in the matrix. There are outliers in the matrix whose patterns
## might be hidden by this color mapping. You can manually set the color
## to `col` argument.
##
## Use `suppressMessages()` to turn off this message.
Heatmap(expressedWT, cluster_columns = FALSE, clustering_method_rows = "complete", show_row_names = FALSE)## The automatically generated colors map from the 1^st and 99^th of the
## values in the matrix. There are outliers in the matrix whose patterns
## might be hidden by this color mapping. You can manually set the color
## to `col` argument.
##
## Use `suppressMessages()` to turn off this message.
Try making a heatmap of the data you subsetted in part 1. Try both clustering by column and setting this setting to false. Explore the help information and see what other row clustering methods are available
We might look at a heatmap of the highly expressed genes. Explore cor() in RStudio, what does this command do? What is t()? Why are we taking the t() of our matrix?
Then we can take this figure, and using a command called hclust, we can pass a method we want the clustering to be done. dist means it is creating a “distance matrix” of the correlations of gene expression, which is a way of saying if we were to chart the values, how far would each be from one another. You can explor ways you can cluster by viewing ?dist. Then this
##The following is borrowed from Dr. Itai Yannai
##A correlation of the highly expressed genes in our set
C = cor(t(dolgaExp))
#then it becomes a heatmap
h <- Heatmap(C)
print(h)## We cluster the values in this set of gene expression using euclidian distance
## Basically on the backend, R is making a matrix of the values in C and then calculating
##How far they are from one another
h <- hclust(d=dist(C))
## Then we are choosing where to cut the dendrogram, here 6 "branches" or groups have
## been cut, you could also choose to cut the dendrogram by height using h instead of K
hc <- as.factor(cutree(h, k=6))
## Now we make a heatmap again using the order of the clustering
hh = ComplexHeatmap::Heatmap(C[h$order, h$order],cluster_rows=FALSE, cluster_columns=FALSE)
## Finally we add annotations to show the groups that we have made
an = HeatmapAnnotation(df = data.frame(hc[h$order]), which = 'row')
print(hh+an)We can see that our data is highly skewed so we may want to perform this on a log scale:
We can also use several methods of hierarchical clustering in our chart. To learn more about these try: ?hclust
## Warning: package 'RColorBrewer' was built under R version 4.0.5
Explore the R Color Brewer Palettes and try a different color palette for a heatmap.